State space time series clustering using discrepancies based on the Kullback-Leibler information and the Mahalanobis distance

Authors

  • Eric D. Foster
Abstract

In this thesis, we consider the clustering of time series data; specifically, time series that can be modeled in the state space framework. Of primary focus is the pairwise discrepancy between two state space time series. The state space model can be formulated in terms of two equations: the state equation, based on a latent process, and the observation equation. Because the unobserved state process is often of interest, we develop discrepancy measures based on the estimated version of the state process. We compare these measures to discrepancies based on the observed data. In all, seven novel discrepancies are formulated. First, discrepancies derived from Kullback-Leibler (KL) information and Mahalanobis distance (MD) measures are proposed based on the observed data. Next, KL information and MD discrepancies are formulated based on the composite marginal contributions of the smoothed estimates of the unobserved state process. Furthermore, an MD is created based on the joint contributions of the collection of smoothed estimates of the unobserved state process. The cross trajectory distance, a discrepancy heavily influenced by both observed and smoothed data, is proposed, as well as a Euclidean distance based on the smoothed state estimates. The performance of these seven novel discrepancies is compared to the often-used Euclidean distance based on the observed data, as well as a KL information discrepancy based on the joint contributions of the collection of smoothed state estimates (Bengtsson and Cavanaugh, 2008). We find that the discrepancy measures based on the smoothed estimates of the unobserved state process outperform those based on the observed data. The best performance is achieved by the discrepancies founded upon the joint contributions of the collection of unobserved states, followed by the discrepancies derived from the marginal contributions.
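
To make the idea of a discrepancy built from smoothed state estimates concrete, the following is a minimal sketch and not the thesis's own formulation. It assumes a simple local level model with known noise variances, and the function names `smooth_local_level` and `marginal_md` are hypothetical. The discrepancy shown is a Mahalanobis-type sum over time points in which each time point contributes separately; the exact definitions used in the thesis may differ.

```python
import numpy as np


def smooth_local_level(y, q=0.1, r=1.0):
    """Kalman filter and RTS smoother for a local level model.

    State equation:       x_t = x_{t-1} + w_t,  Var(w_t) = q
    Observation equation: y_t = x_t + v_t,      Var(v_t) = r
    q and r are assumed known here; in practice they would be estimated.
    Returns the smoothed state means and variances.
    """
    n = len(y)
    m_pred, p_pred = np.empty(n), np.empty(n)
    m_filt, p_filt = np.empty(n), np.empty(n)
    m, p = 0.0, 1e6  # vague prior before the first observation (assumption)
    for t in range(n):
        m_pred[t], p_pred[t] = m, p + q          # predict
        k = p_pred[t] / (p_pred[t] + r)          # Kalman gain
        m = m_pred[t] + k * (y[t] - m_pred[t])   # update
        p = (1.0 - k) * p_pred[t]
        m_filt[t], p_filt[t] = m, p
    m_s, p_s = m_filt.copy(), p_filt.copy()
    for t in range(n - 2, -1, -1):               # backward smoothing pass
        g = p_filt[t] / p_pred[t + 1]
        m_s[t] = m_filt[t] + g * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + g**2 * (p_s[t + 1] - p_pred[t + 1])
    return m_s, p_s


def marginal_md(a, b):
    """Illustrative Mahalanobis-type discrepancy between two smoothed series.

    Each time point contributes its squared mean difference scaled by the
    summed smoothed variances (hypothetical form, chosen for illustration).
    """
    (ma, pa), (mb, pb) = a, b
    return float(np.sum((ma - mb) ** 2 / (pa + pb)))


# Simulated data: two series around one latent signal, two around its mirror.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 100)
signal = 5.0 * np.sin(t)
series = [
    signal + rng.normal(size=t.size),
    signal + rng.normal(size=t.size),
    -signal + rng.normal(size=t.size),
    -signal + rng.normal(size=t.size),
]
smoothed = [smooth_local_level(y) for y in series]

# Pairwise discrepancy matrix, the usual input to a clustering routine.
D = np.array([[marginal_md(a, b) for b in smoothed] for a in smoothed])
print(np.round(D, 1))
```

The resulting matrix is symmetric with a zero diagonal, so it could be passed to any clustering procedure that accepts pairwise dissimilarities, such as agglomerative hierarchical clustering.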

Similar resources

Discriminant Analysis for ARMA Models Based on Divergency Criterion: A Frequency Domain Approach

The extension of classical analysis to time series data is a basic problem faced in many fields, such as engineering, economics, and medicine. The main objective of discriminant time series analysis is to examine how far it is possible to distinguish between various groups. There are two situations to be considered in linear time series models. Firstly when the main discriminatory informati...

Using Kullback-Leibler distance for performance evaluation of search designs

This paper considers the search problem, introduced by Srivastava [Sr]. This is a model discrimination problem. In the context of search linear models, the discrimination ability of search designs has been studied by several researchers. Some criteria have been developed to measure this capability; however, they are restricted in the sense of being able to work for searching only one possibl...

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large datasets has always been one of the most important challenges in different sciences,...

Model Confidence Set Based on Kullback-Leibler Divergence Distance

Consider the problem of estimating the true density h(.) based upon a random sample X1,…, Xn. In general, h(.) is approximated using an appropriate (in some sense, see below) model fθ(x). This article, using Vuong's (1989) test along with a collection of k (> 2) non-nested models, constructs a set of appropriate models, say a model confidence set, for the unknown model h(.). Application of such confide...

On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces

Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. In order to reduce the storage space and ensure efficient performance of queries, dimensionality reduction while preserving the inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point o...

Publication date: 2016